Can Subcategorisation Probabilities Help a Statistical Parser?

Authors

  • John A. Carroll
  • Guido Minnen
  • Ted Briscoe
Abstract

Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text, which shows that this information can significantly improve parse accuracy.

Similar resources

Three Generative, Lexicalised Models for Statistical Parsing

In this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. We then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. Results on Wall Street Journal text show that the parser performs at 88.1/87.5% constituent precision/recall, an average improvement of 2.3% over (Collins 96).

Can Subcategorization Help a Statistical Dependency Parser?

Today there is a relatively large body of work on automatic acquisition of lexicosyntactical preferences (subcategorization) from corpora. Various techniques have been developed that not only produce machine-readable subcategorization dictionaries but are also capable of weighting the various subcategorization frames probabilistically. Clearly there should be a potential to use such weighted...

Learning Subcategorisation Information to Model a Grammar with “Co-restrictions”

This paper describes two different tasks involving the notion of subcategorisation in NLP. First, it presents a specific strategy to acquire both nominal and verbal subcategorisation from text corpora. More precisely, we describe an unsupervised method for extracting syntactic and semantic subcategorisation from partially parsed texts. The second task concerns the usage of subcategorisation inf...

Bootstrapping Statistical Processing Into A Rule-Based Natural Language Parser

This paper describes a "bootstrapping" method which uses a broad-coverage, rule-based parser to compute probabilities while parsing an untagged corpus of NL text, and which then incorporates those probabilities into the processing of the same parser as it analyzes new text. Results are reported which show that this method can significantly improve the speed and accuracy of the parser without re...

Journal:
  • CoRR

Volume: cmp-lg/9806013

Publication year: 1998